Abstract
Web-based educational systems routinely collect vast quantities of data on students’ e-activity, generating log files that offer researchers unique opportunities to apply data mining techniques and discover information that can improve the learning process. This paper proposes DRAL, a user-friendly and intuitive tool that detects the e-activities most relevant for a student to pass a course, based on features extracted from the logged data of an educational web-based system. The method uses a more flexible representation of the available information, based on multiple instance learning, which prevents the appearance of a large number of missing values. It relies on a multi-objective grammar-guided genetic programming algorithm that obtains simple, clear classification rules; these rules are particularly useful for identifying the number, type and time of the e-activities that give a student a high probability of passing a course. To validate this approach, our proposal is compared with the most established multiple instance learning methods proposed over the years. Experimental results show that the proposed approach improves on the accuracy of previous models by striking a balance between specificity and sensitivity.
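To make the representation and the two objectives more concrete, the sketch below models each student as a bag of e-activity instances and scores a candidate rule by sensitivity and specificity. It is a minimal illustration under stated assumptions, not the DRAL implementation. The instance features (`activity`, `count`, `minutes`), the example students and the rules are all hypothetical. The bag label follows the standard multi-instance assumption (a student is predicted to pass if at least one instance satisfies the rule). The candidate rules are hand-written rather than evolved by grammar-guided genetic programming, and they are compared with ordinary Pareto dominance over the two objectives.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Instance:
    """One type of e-activity performed by a student (hypothetical features)."""
    activity: str   # e.g. "quiz", "assignment", "forum"
    count: int      # number of times the activity was performed
    minutes: float  # total time spent on it


@dataclass
class Bag:
    """A student: a variable-length bag of instances plus the course outcome.

    A student who never used an activity simply has no instance for it,
    so no missing-value placeholder is needed.
    """
    instances: List[Instance]
    passed: bool


Rule = Callable[[Instance], bool]


def predict(rule: Rule, bag: Bag) -> bool:
    # Assumed standard multi-instance hypothesis: the bag is predicted
    # positive ("pass") if at least one of its instances satisfies the rule.
    return any(rule(inst) for inst in bag.instances)


def sensitivity_specificity(rule: Rule, bags: Sequence[Bag]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of a rule over labelled bags."""
    tp = fn = tn = fp = 0
    for bag in bags:
        pred = predict(rule, bag)
        if bag.passed:
            if pred:
                tp += 1
            else:
                fn += 1
        else:
            if pred:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for objectives that are both maximised."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Sequence[Tuple[float, float]]) -> List[int]:
    """Indices of the non-dominated objective vectors."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]


# Hypothetical students: bags of different lengths, no missing values.
students = [
    Bag([Instance("quiz", 5, 60.0), Instance("forum", 2, 10.0)], passed=True),
    Bag([Instance("assignment", 3, 120.0)], passed=True),
    Bag([Instance("forum", 1, 5.0)], passed=False),
    Bag([Instance("quiz", 1, 8.0)], passed=False),
]


# Hypothetical hand-written candidate rules over a single instance.
def rule_a(i: Instance) -> bool:
    return i.activity == "quiz" and i.count >= 3


def rule_b(i: Instance) -> bool:
    return i.minutes >= 30.0


def rule_c(i: Instance) -> bool:
    return i.activity == "forum"


rules = [rule_a, rule_b, rule_c]
scores = [sensitivity_specificity(r, students) for r in rules]

if __name__ == "__main__":
    for r, s in zip(rules, scores):
        print(r.__name__, s)
    print("non-dominated rules:", [rules[i].__name__ for i in pareto_front(scores)])
```

On these toy students, `rule_b` reaches both full sensitivity and full specificity and therefore dominates the other two rules, whereas `rule_a` trades sensitivity for specificity. A multi-objective search keeps such non-dominated trade-offs instead of collapsing them into a single accuracy figure.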
Acknowledgments
The authors gratefully acknowledge the financial support provided by the Spanish Department of Research under projects TIN2008-06681-C06-03 and P08-TIC-3720, and by FEDER funds.
Cite this article
Zafra, A., Romero, C. & Ventura, S. DRAL: a tool for discovering relevant e-activities for learners. Knowl Inf Syst 36, 211–250 (2013). https://doi.org/10.1007/s10115-012-0531-8
DOI: https://doi.org/10.1007/s10115-012-0531-8