DOI: 10.1145/3303772.3303800
research-article

An Analysis of Student Representation, Representative Features and Classification Algorithms to Predict Degree Dropout

Published: 04 March 2019
  • Abstract

    Identifying and monitoring students who are likely to drop out is a vital issue for universities. Early detection allows institutions to intervene, addressing problems and retaining students. Prior research into the early detection of at-risk students has relied on predictive models, but a comprehensive assessment of the suitability of different algorithms and approaches is complicated by the large number of variable features that constitute a student's educational experience. Predictive models vary in terms of their amplitude, temporality and the learning algorithms employed. Amplitude refers to the ability of the model to operate on multiple degrees, while temporality is often considered because of the natural temporal aspect of the data. In the absence of a comparative framework of learning algorithms, this paper aims to provide such an analysis, based on a proposed classification of strategies for predicting dropouts in Higher Education Institutions. Three different student representations are implemented (namely Global Feature-Based, Local Feature-Based, and Time Series) in conjunction with the appropriate learning algorithms for each of them. A description of each approach, together with its implementation process, is presented in this paper as a technical contribution. The experiment uses a dataset of student information from two degrees, namely Business Administration and Architecture, acquired through an automated management system from a university in Brazil. Our findings can be summarized as follows: (i) of the three proposed student representations, the Local Feature-Based was the most suitable approach for predicting dropout. In addition to providing high-quality results, the Local Feature-Based representations are simple to build, and the construction of the model is less expensive when compared to more complex ones; (ii) as a conclusion of the results obtained via Local Feature-Based, dropout can be said to be accurately predicted using grades of a few core courses, so there is no need for a complex feature extraction process; (iii) considering temporal aspects of the data does not seem to contribute to the prediction performance, although it increases computational costs as the model complexity increases.
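
    The abstract names the three student representations but does not define them. The sketch below is one possible reading, not the authors' implementation: it assumes Global Feature-Based means summary statistics over the whole transcript, Local Feature-Based means one grade per selected core course, and Time Series means an ordered per-semester sequence. The record layout, the 0-10 grade scale, the `PASS_MARK` threshold, and all course names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transcript rows: (course_id, semester, grade on an assumed 0-10 scale).
records = [
    ("CALC1", 1, 4.0),
    ("INTRO", 1, 7.5),
    ("CALC1", 2, 6.0),
    ("STAT1", 2, 8.0),
]

PASS_MARK = 5.0  # assumed passing threshold, not taken from the paper


def global_features(rows):
    """Assumed 'global' view: aggregate the whole transcript into summary statistics."""
    grades = [grade for _, _, grade in rows]
    return {
        "mean_grade": mean(grades),
        "n_enrolments": len(grades),
        "n_failures": sum(grade < PASS_MARK for grade in grades),
    }


def local_features(rows, core_courses):
    """Assumed 'local' view: latest grade per selected course, None if never taken."""
    latest = {}
    for course, _, grade in sorted(rows, key=lambda r: r[1]):
        latest[course] = grade  # later semesters overwrite earlier attempts
    return [latest.get(course) for course in core_courses]


def time_series(rows):
    """Assumed 'time series' view: mean grade per semester, in chronological order."""
    by_semester = defaultdict(list)
    for _, semester, grade in rows:
        by_semester[semester].append(grade)
    return [mean(by_semester[s]) for s in sorted(by_semester)]


if __name__ == "__main__":
    print(global_features(records))
    print(local_features(records, ["CALC1", "STAT1", "PHYS1"]))
    print(time_series(records))
```

    Under these assumptions the local view is the cheapest to build, since it only looks up a handful of course grades, which is consistent with finding (ii); the vectors it produces could then be passed to any standard classifier.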

    Published In

    LAK19: Proceedings of the 9th International Conference on Learning Analytics & Knowledge
    March 2019
    565 pages
    ISBN: 9781450362566
    DOI: 10.1145/3303772
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Degree Dropout Analysis
    2. Dropout Prediction
    3. Features Extraction
    4. Student Representation
    5. Temporal Analysis

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    LAK19

    Acceptance Rates

    Overall Acceptance Rate 236 of 782 submissions, 30%
