DOI: 10.1145/3231644.3231656

Replicating MOOC predictive models at scale

Published: 26 June 2018

Abstract

We present a case study in predictive model replication for student dropout in Massive Open Online Courses (MOOCs), using a large and diverse dataset (133 sessions of 28 unique courses offered by two institutions). The experiment was run on the MOOC Replication Framework (MORF), which makes it feasible to fully replicate complex machine-learned models, from raw data through model evaluation. We provide an overview of the MORF platform architecture and functionality, and demonstrate its use through a case study. In this replication of [41], we contextualize and evaluate the results of the previous work using statistical tests and a more effective model evaluation scheme. We find that only some of the original findings replicate across this larger and more diverse sample of MOOCs, while others replicate significantly in the opposite direction. Our analysis also reveals results highly relevant to the prediction task that were not reported in the original experiment. This work demonstrates the importance of replicating predictive modeling research in MOOCs using large and diverse datasets, illuminates the challenges of doing so, and describes our freely available, open-source software framework for overcoming barriers to replication.
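To make the replicated technique concrete, below is a minimal, illustrative sketch of stacked generalization [40], the ensembling approach used in the original experiment [41]: out-of-fold predictions from several base classifiers become the inputs to a meta-learner, and the resulting model is scored with AUC [26]. This is not the authors' actual MORF pipeline; the synthetic data, choice of base learners, and feature dimensions are placeholder assumptions (MORF itself builds features from raw course clickstream exports).

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Stand-in for MORF-extracted features (e.g., weekly counts of forum
    # posts, video views, and quiz attempts); purely synthetic here.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)

    # Level-0 learners; their out-of-fold predictions become the inputs
    # to the level-1 meta-learner, per Wolpert's stacked generalization [40].
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
            ("svm", SVC(probability=True, random_state=0)),
        ],
        final_estimator=LogisticRegression(),
        cv=5,  # internal CV keeps training labels from leaking to the meta-learner
    )
    stack.fit(X_tr, y_tr)

    # Threshold-free evaluation via area under the ROC curve [26].
    print("held-out AUC:", roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1]))

Under MORF's replication setup, a model like this is trained and tested separately for each course session; the per-session results can then be combined with a meta-analytic significance test such as Stouffer's method [35] to judge whether a finding holds across the full sample.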

References

[1] J. M. L. Andres, R. S. Baker, G. Siemens, D. Gašević, and S. Crossley. Studying MOOC completion at scale using the MOOC replication framework. In Proceedings of the International Conference on Learning Analytics and Knowledge, pages 71--78, Mar. 2018.
[2] J. M. L. Andres, R. S. Baker, G. Siemens, D. Gašević, and C. A. Spann. Replicating 21 findings on student success in online learning. Technology, Instruction, Cognition, and Learning, pages 313--333, 2016.
[3] G. Balakrishnan and D. Coetzee. Predicting student retention in massive open online courses using hidden Markov models. Technical report, Univ. Calif. at Berkeley EECS Dept., 2013.
[4] C. Boettiger. An introduction to Docker for reproducible research. Oper. Syst. Rev., 49(1):71--79, Jan. 2015.
[5] K. Bollen, J. T. Cacioppo, R. M. Kaplan, J. A. Krosnick, J. L. Olds, and H. Dean. Social, behavioral, and economic sciences perspectives on robust and reliable science. Technical report, NSF Subcommittee on Replicability in Science, 2015.
[6] S. Boyer and K. Veeramachaneni. Transfer learning for predictive models in massive open online courses. In Artificial Intelligence in Education, pages 54--63. Springer, Cham, June 2015.
[7] M. J. Brandt, H. IJzerman, A. Dijksterhuis, F. J. Farach, J. Geller, R. Giner-Sorolla, J. A. Grange, M. Perugini, J. R. Spies, and A. van 't Veer. The replication recipe: What makes for a convincing replication? J. Exp. Soc. Psych., 50:217--224, 2014.
[8] C. Brooks, C. Thompson, and S. Teasley. A time series interaction analysis method for building predictive models of learners using log data. In Proceedings of the Fifth International Conference on Learning Analytics and Knowledge, pages 126--135. ACM, Mar. 2015.
[9] J. Cito, V. Ferme, and H. C. Gall. Using Docker containers to improve reproducibility in software and web engineering research. In Web Engineering, Lecture Notes in Computer Science, pages 609--612. Springer, Cham, June 2016.
[10] Open Science Collaboration. Estimating the reproducibility of psychological science. Science, 349(6251):aac4716, Aug. 2015.
[11] C. Collberg, T. Proebsting, G. Moraila, A. Shankaran, Z. Shi, and A. M. Warren. Measuring reproducibility in computer systems research. Technical report, Univ. Arizona Dept. of Comp. Sci., 2014.
[12] S. Crossley, L. Paquette, M. Dascalu, D. S. McNamara, and R. S. Baker. Combining click-stream data with NLP tools to better understand MOOC completion. In Proceedings of the Sixth International Conference on Learning Analytics & Knowledge, pages 6--14, 2016.
[13] J. P. Daries, J. Reich, J. Waldo, E. M. Young, J. Whittinghill, A. D. Ho, D. T. Seaton, and I. Chuang. Privacy, anonymity, and big data in the social sciences. Commun. ACM, 57(9):56--63, 2014.
[14] F. Dernoncourt, C. Taylor, K. Veeramachaneni, and U.-M. O'Reilly. MoocDb: Developing standards and systems for MOOC data science. Technical report, MIT, 2013.
[15] T. G. Dietterich. Ensemble methods in machine learning. In Multiple Classifier Systems, pages 1--15. Springer, Berlin, Heidelberg, June 2000.
[16] D. Donoho. 50 years of data science. In Tukey Centennial Workshop, Princeton, NJ, pages 1--41, 2015.
[17] B. J. Evans, R. B. Baker, and T. S. Dee. Persistence patterns in massive open online courses (MOOCs). J. Higher Educ., 87(2):206--242, Mar. 2016.
[18] M. Fei and D. Y. Yeung. Temporal models for predicting student dropout in massive open online courses. In Intl. Conf. on Data Mining Workshop (ICDMW), pages 256--263, 2015.
[19] J. Fogarty, R. S. Baker, and S. E. Hudson. Case studies in the use of ROC curve analysis for sensor-based estimates in human computer interaction. In Proceedings of Graphics Interface 2005, pages 129--136, 2005.
[20] J. A. Gámez, J. L. Mateo, and J. M. Puerta. Learning Bayesian networks by hill climbing: Efficient methods based on progressive restriction of the neighborhood. Data Min. Knowl. Discov., 22(1-2):106--148, Jan. 2011.
[21] J. Gardner and C. Brooks. Dropout model evaluation in MOOCs. In Proceedings of the Eighth AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), 2018.
[22] J. Gardner and C. Brooks. Evaluating predictive models of student success: Closing the methodological gap. Journal of Learning Analytics, 2018. In press.
[23] J. Gardner and C. Brooks. Student success prediction in MOOCs. User Modeling and User-Adapted Interaction, 2018.
[24] J. Gardner, C. Brooks, J. M. L. Andres, and R. Baker. MORF: A framework for MOOC predictive modeling and replication at scale. 2018.
[25] A. Gelman and E. Loken. The garden of forking paths: Why multiple comparisons can be a problem, even when there is no "fishing expedition" or "p-hacking" and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University, 2013.
[26] J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1):29--36, Apr. 1982.
[27] MITx and HarvardX. HarvardX-MITx Person-Course Academic Year 2013 De-Identified dataset, version 2.0, May 2014.
[28] R. F. Kizilcec and C. Brooks. Diverse big data and randomized field experiments in MOOCs. In C. Lang, G. Siemens, A. Wise, and D. Gašević, editors, Handbook of Learning Analytics, pages 211--222. Society for Learning Analytics Research, 2017.
[29] R. F. Kizilcec and S. Halawa. Attrition and achievement gaps in online learning. In Proceedings of the Second (2015) ACM Conference on Learning @ Scale, pages 57--66, 2015.
[30] M. C. Makel and J. A. Plucker. Facts are more important than novelty: Replication in the education sciences. Educ. Res., 43(6):304--316, 2014.
[31] D. Merkel. Docker: Lightweight Linux containers for consistent development and deployment. Linux J., 2014(239), Mar. 2014.
[32] B. A. Nosek, J. R. Spies, and M. Motyl. Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspect. Psychol. Sci., 7(6):615--631, Nov. 2012.
[33] T. Sinha, N. Li, P. Jermann, and P. Dillenbourg. Capturing "attrition intensifying" structural traits from didactic interaction sequences of MOOC learners. Sept. 2014.
[34] V. Stodden and S. Miguez. Best practices for computational science: Software infrastructure and environments for reproducible and extensible research. Journal of Open Research Software, 2(1):1--6, 2013.
[35] S. A. Stouffer. Adjustment during army life. Princeton University Press, 1949.
[36] V. Tinto. Research and practice of student retention: What next? J. Coll. Stud. Ret., 8(1):1--19, 2006.
[37] T. J. Tobin and G. M. Sugai. Using sixth-grade school records to predict school violence, chronic discipline problems, and high school outcomes. J. Emot. Behav. Disord., 7(1):40--53, Jan. 1999.
[38] K. Veeramachaneni, U.-M. O'Reilly, and C. Taylor. Towards feature engineering at scale for data from massive open online courses. July 2014.
[39] J. Whitehill, K. Mohan, D. Seaton, Y. Rosen, and D. Tingley. Delving deeper into MOOC student dropout prediction. Feb. 2017.
[40] D. H. Wolpert. Stacked generalization. Neural Netw., 5(2):241--259, 1992.
[41] W. Xing, X. Chen, J. Stein, and M. Marcinkowski. Temporal predication of dropouts in MOOCs: Reaching the low hanging fruit through stacking generalization. Comput. Human Behav., 58:119--129, 2016.
[42] D. Yang, T. Sinha, D. Adamson, and C. P. Rosé. Turn on, tune in, drop out: Anticipating student dropouts in massive open online courses. In Proceedings of the 2013 NIPS Data-driven education workshop, volume 11, page 14, 2013.


Published In

L@S '18: Proceedings of the Fifth Annual ACM Conference on Learning at Scale
June 2018
391 pages
ISBN:9781450358866
DOI:10.1145/3231644

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article

Conference

L@S '18: Fifth (2018) ACM Conference on Learning @ Scale
June 26 - 28, 2018
London, United Kingdom

Acceptance Rates

L@S '18 Paper Acceptance Rate: 24 of 58 submissions, 41%.
Overall Acceptance Rate: 117 of 440 submissions, 27%.

