DOI:10.1145/3639478.3643121
short-paper
Open access

Data vs. Model Machine Learning Fairness Testing: An Empirical Study

Published: 23 May 2024

Abstract

Although several fairness definitions and bias mitigation techniques exist in the literature, all existing solutions evaluate the fairness of Machine Learning (ML) systems after the training stage. In this paper, we take the first steps towards evaluating a more holistic approach by testing for fairness both before and after model training. We evaluate the effectiveness of the proposed approach and position it within the ML development lifecycle, using an empirical analysis of the relationship between model-dependent and model-independent fairness metrics. The study uses 2 fairness metrics, 4 ML algorithms, 5 real-world datasets and 1,600 fairness evaluation cycles. We find a linear relationship between data and model fairness metrics when the distribution and size of the training data change. Our results indicate that testing for fairness prior to training can be a "cheap" and effective means of catching a biased data collection process early, detecting data drifts in production systems, and minimising the execution of full training cycles, thus reducing development time and costs.
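To make the idea of a model-independent ("data") fairness test concrete, the sketch below computes a fairness metric directly on labelled training data, before any model is trained, in the spirit of the pre-training testing the abstract describes. The metric shown (disparate impact on raw labels) and the function name are illustrative assumptions; this excerpt does not name the paper's actual metrics.

```python
def disparate_impact(labels, groups, protected, favourable=1):
    """Ratio of favourable-outcome rates: protected group vs. everyone else.

    A value near 1.0 suggests parity in the raw data; values far below 1.0
    can flag a biased data collection process before any training happens.
    """
    prot = [y for y, g in zip(labels, groups) if g == protected]
    rest = [y for y, g in zip(labels, groups) if g != protected]

    def rate(ys):
        return sum(1 for y in ys if y == favourable) / len(ys)

    return rate(prot) / rate(rest)


# Toy data: binary outcomes and a sensitive attribute with two groups.
labels = [1, 0, 1, 1, 0, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(round(disparate_impact(labels, groups, protected="b"), 2))  # → 0.67
```

Because this check needs only the dataset, it can run on every data refresh; a full training cycle is triggered only when the data-level metric moves, which is the cost-saving mechanism the abstract points to.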



Published In

ICSE-Companion '24: Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings
April 2024
531 pages
ISBN:9798400705021
DOI:10.1145/3639478
This work is licensed under a Creative Commons Attribution 4.0 International License.

In-Cooperation

  • Faculty of Engineering of University of Porto

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. SE4ML
  2. ML fairness testing
  3. empirical software engineering
  4. data-centric AI

Qualifiers

  • Short-paper

Conference

ICSE-Companion '24

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
