Effective Feature Learning with Unsupervised Learning for Improving the Predictive Models in Massive Open Online Courses

Authors:

Ting-Chuen Pong

LAK19: Proceedings of the 9th International Conference on Learning Analytics & Knowledge

Pages 135 - 144

https://doi.org/10.1145/3303772.3303795

Published: 04 March 2019

Abstract

The effectiveness of learning in massive open online courses (MOOCs) can be significantly enhanced by introducing personalized intervention schemes that rely on predictive models of student learning behaviors, such as engagement or performance indicators. A major challenge in building such models is designing handcrafted features that are effective for the prediction task at hand. In this paper, we make the first attempt to solve the feature learning problem by taking an unsupervised learning approach to learn a compact representation of raw features that have a large degree of redundancy. Specifically, in order to capture the underlying learning patterns in the content domain and the temporal nature of the clickstream data, we train a modified auto-encoder (AE) combined with a long short-term memory (LSTM) network to obtain a fixed-length embedding for each input sequence. Compared with the original features, the new features that correspond to the embedding obtained by the modified LSTM-AE are not only more parsimonious but also more discriminative for our prediction task. Using simple supervised learning models, the learned features can improve the prediction accuracy by up to 17% compared with the supervised neural networks and reduce overfitting to the dominant low-performing group of students, specifically in the task of predicting students' performance. Our approach is generic in the sense that it is restricted neither to a specific supervised learning model nor to a specific prediction task for MOOC learning analytics.
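The abstract does not give the details of the modified LSTM-AE, so the following is only a minimal sketch of a generic LSTM auto-encoder, not the authors' architecture. It shows the one property the abstract relies on: sequences of different lengths are mapped to an embedding of the same fixed length, from which the sequence is then reconstructed. The weights are random and untrained, and every name here (`LSTMCell`, `encode`, `decode`, `W_out`, the feature and embedding sizes) is a hypothetical choice made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """A single LSTM cell with randomly initialised (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        scale = 1.0 / np.sqrt(hidden_dim)
        # One stacked matrix holds the input, forget, candidate and output gates.
        self.W = rng.uniform(-scale, scale, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c


def encode(cell, seq):
    """Run the encoder over a (T, n_features) sequence; the last hidden state is the embedding."""
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    for x in seq:
        h, c = cell.step(x, h, c)
    return h


def decode(cell, W_out, embedding, n_steps):
    """Feed the embedding to the decoder at every step and project back to feature space."""
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    outputs = []
    for _ in range(n_steps):
        h, c = cell.step(embedding, h, c)
        outputs.append(W_out @ h)
    return np.stack(outputs)


# Hypothetical sizes: 8 raw clickstream features per time step, 4-dimensional embedding.
rng = np.random.default_rng(0)
n_features, embed_dim = 8, 4
encoder = LSTMCell(n_features, embed_dim, rng)
decoder = LSTMCell(embed_dim, embed_dim, rng)
W_out = rng.normal(0.0, 0.1, (n_features, embed_dim))

# Two synthetic sequences of different lengths (random data, not real MOOC logs).
short_seq = rng.random((5, n_features))
long_seq = rng.random((12, n_features))

z_short = encode(encoder, short_seq)
z_long = encode(encoder, long_seq)
recon_long = decode(decoder, W_out, z_long, len(long_seq))

# Mean squared reconstruction error: the quantity an auto-encoder would minimise in training.
mse_long = float(np.mean((recon_long - long_seq) ** 2))
```

In a trained model, the embedding returned by `encode` would serve as the input features for a simple downstream classifier. Training (for example with backpropagation through time in a deep learning framework) is omitted here to keep the sketch short.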

Index Terms

Effective Feature Learning with Unsupervised Learning for Improving the Predictive Models in Massive Open Online Courses
1. Applied computing
  1. Education
    1. E-learning
2. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
    2. Machine learning approaches
      1. Learning latent representations
      2. Neural networks

Published In

LAK19: Proceedings of the 9th International Conference on Learning Analytics & Knowledge

March 2019

565 pages

ISBN:9781450362566

DOI:10.1145/3303772

General Chairs:
Sharon Hsiao, Arizona State University, USA
Jim Cunningham, Arizona State University, USA
Katie McCarthy, Georgia State University, USA
Grace Lynch, Society for Learning Analytics Research, Australia

Program Chairs:
Christopher Brooks, University of Michigan, USA
Rebecca Ferguson, The Open University, UK
Ulrich Hoppe, University of Duisburg-Essen, Germany

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SoLAR: The Society for Learning Analytics Research
SIGCHI: ACM Special Interest Group on Computer-Human Interaction

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

LAK19

LAK19: The 9th International Learning Analytics & Knowledge Conference

March 4 - 8, 2019

Tempe, AZ, USA

Acceptance Rates

Overall Acceptance Rate 236 of 782 submissions, 30%

Bibliometrics

Total Citations: 13
Total Downloads: 446
Downloads (Last 12 months): 25
Downloads (Last 6 weeks): 2

Reflects downloads up to 12 Aug 2024

Cited By

Yürüm O, Taşkaya-Temizel T, Yıldırım S (2023). Predictive Video Analytics in Online Courses: A Systematic Literature Review. Technology, Knowledge and Learning. Online publication date: 4-Nov-2023.
https://doi.org/10.1007/s10758-023-09697-z
Alam N, Mostafavi B, Chi M, Barnes T (2023). Exploring the Effect of Autoencoder Based Feature Learning for a Deep Reinforcement Learning Policy for Providing Proactive Help. Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium and Blue Sky (278-283). Online publication date: 30-Jun-2023.
https://doi.org/10.1007/978-3-031-36336-8_43
Minematsu T, Taniguchi Y, Shimada A (2023). Contrastive Learning for Reading Behavior Embedding in E-book System. Artificial Intelligence in Education (426-437). Online publication date: 26-Jun-2023.
https://doi.org/10.1007/978-3-031-36272-9_35
Saravanan T, Nagadeepa N, Mukunthan B (2022). The Effective Learning Approach to ICT-TPACK and Prediction of the Academic Performance of Students Based on Machine Learning Techniques. Communication and Intelligent Systems (79-93). Online publication date: 19-Aug-2022.
https://doi.org/10.1007/978-981-19-2130-8_7
Queiroga E, Enríquez C, Cechinel C, Casas A, Paragarino V, Bencke L, Ramos V (2021). Using Virtual Learning Environment Data for the Development of Institutional Educational Policies. Applied Sciences 11:15 (6811). Online publication date: 24-Jul-2021.
https://doi.org/10.3390/app11156811
Lim H, Kim S, Chung K, Lee K, Kim T, Heo J (2021). Is college students’ trajectory associated with academic performance? Computers & Education 178:C. Online publication date: 29-Dec-2021.
https://dl.acm.org/doi/10.1016/j.compedu.2021.104397
Prenkaj B, Velardi P, Stilo G, Distante D, Faralli S (2020). A Survey of Machine Learning Approaches for Student Dropout Prediction in Online Courses. ACM Computing Surveys 53:3 (1-34). Online publication date: 28-May-2020.
https://doi.org/10.1145/3388792
Wei H, Li H, Xia M, Wang Y, Qu H, Rensing C, Drachsler H, Kovanović V, Pinkwart N, Scheffel M, Verbert K (2020). Predicting student performance in interactive online question pools using mouse interaction features. Proceedings of the Tenth International Conference on Learning Analytics & Knowledge (645-654). Online publication date: 23-Mar-2020.
https://dl.acm.org/doi/10.1145/3375462.3375521
Tang X, Wang Z, Liu J, Ying Z (2020). An exploratory analysis of the latent structure of process data via action sequence autoencoders. British Journal of Mathematical and Statistical Psychology 74:1 (1-33). Online publication date: 22-May-2020.
https://doi.org/10.1111/bmsp.12203
Moreno-Marcos P, Pong T, Munoz-Merino P, Delgado Kloos C (2020). Analysis of the Factors Influencing Learners’ Performance Prediction With Learning Analytics. IEEE Access 8 (5264-5282). Online publication date: 2020.
https://doi.org/10.1109/ACCESS.2019.2963503
