DOI: 10.1145/3639479.3639492

Research Article
Ensuring Ethical, Transparent, and Auditable Use of Education Data and Algorithms on AutoML

Published: 28 February 2024

Abstract

Automated machine learning (AutoML) creates new opportunities for less experienced users to build and test their own data mining models. Although AutoML builds the models for the user, technical knowledge and tools are still needed to evaluate those models, and the black-box nature of machine learning models can give rise to problems of algorithmic bias and fairness. Such biases can escalate in downstream applications, necessitating a structured approach to fairness evaluation in AutoML: defining fairness criteria, selecting appropriate metrics, assessing fairness across groups, and addressing biases. In educational data mining, where AutoML is prevalent, biases related to attributes such as gender or race can lead to unethical outcomes. Because fairness metrics vary in definition and strength, and some may even contradict others, fairness evaluation becomes more complex. In this paper, ten fairness metrics were chosen, explored, and implemented on four AutoML tools: Vertex AI, AutoSklearn, AutoKeras, and PyCaret. We identified two open educational datasets and built both prediction and classification models with these AutoML frameworks. We report our work evaluating the machine learning models created by AutoML, discuss the challenges of evaluating fairness in those models, and describe our efforts to mitigate and resolve the problems of algorithmic bias in educational data mining.


Published In

MLNLP '23: Proceedings of the 2023 6th International Conference on Machine Learning and Natural Language Processing
December 2023, 252 pages
ISBN: 9798400709241
DOI: 10.1145/3639479

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Author Tags

          1. AutoML
          2. Automated machine learning
          3. algorithmic bias
          4. educational data mining
          5. fairness

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Conference

          MLNLP 2023
