
Missing the missing values: The ugly duckling of fairness in machine learning

Published: 28 May 2021

Abstract

Nowadays, there is increasing concern in machine learning about the causes underlying unfair decision making, that is, algorithmic decisions that discriminate against some groups over others, especially groups defined by protected attributes such as gender, race and nationality. Missing values are one frequent manifestation of all these latent causes: protected groups are more reluctant to give information that could be used against them, sensitive information for some groups can be erased by human operators, or data acquisition may simply be less complete and systematic for minority groups. However, most recent techniques, libraries and experimental results dealing with fairness in machine learning have simply ignored missing data. In this paper, we present the first comprehensive analysis of the relation between missing values and algorithmic fairness for machine learning: (1) we analyse the sources of missing data and bias, mapping the common causes; (2) we find that rows containing missing values are usually fairer than the rest, which should discourage treating missing values as the uncomfortable ugly data that techniques and libraries for handling algorithmic bias discard at the first opportunity; (3) we study the trade-off between performance and fairness when the rows with missing values are used (either because the technique deals with them directly or through imputation methods); and (4) we show that the sensitivity of six different machine-learning techniques to missing values is usually low, which reinforces the view that the rows with missing data contribute more to fairness through the other, non-missing, attributes. We end the paper with a series of recommended procedures about what to do with missing data when aiming for fair decision making.
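
To make the trade-off studied in point (3) concrete, the sketch below contrasts two common treatments of rows with missing values (listwise deletion versus mean imputation) and audits each with both accuracy and a simple group-fairness measure, the demographic parity difference. It is a minimal illustration under stated assumptions, not the paper's experimental pipeline: the synthetic data, the missingness mechanism that affects the minority group more often, and the scikit-learn logistic regression are all choices made for brevity.

```python
# Minimal sketch (not the paper's pipeline): compare dropping rows with
# missing values against mean imputation, tracking accuracy and the
# demographic parity difference for a protected attribute. The data,
# column names and model are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000
df = pd.DataFrame({
    "group": rng.integers(0, 2, n),  # protected attribute (0 = majority, 1 = minority)
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
})
df["label"] = ((df.x1 + 0.5 * df.group + rng.normal(scale=0.5, size=n)) > 0).astype(int)
# Make x2 missing far more often for the minority group (a MAR-like mechanism).
mask = rng.random(n) < np.where(df.group == 1, 0.35, 0.05)
df.loc[mask, "x2"] = np.nan

def fit_and_audit(data: pd.DataFrame) -> tuple[float, float]:
    """Return (accuracy, demographic parity difference) on a held-out split."""
    train, test = train_test_split(data, test_size=0.3, random_state=0)
    model = LogisticRegression().fit(train[["x1", "x2"]], train["label"])
    pred = model.predict(test[["x1", "x2"]])
    acc = (pred == test["label"]).mean()
    # Demographic parity difference: |P(pred=1 | group=0) - P(pred=1 | group=1)|
    rates = pd.Series(pred, index=test.index).groupby(test["group"]).mean()
    return acc, abs(rates[0] - rates[1])

# Strategy 1: listwise deletion, which discards minority rows disproportionately.
print("drop   :", fit_and_audit(df.dropna()))

# Strategy 2: mean imputation, which keeps every row in the data.
imputed = df.fillna({"x2": df["x2"].mean()})
print("impute :", fit_and_audit(imputed))
```

Because deletion removes minority rows disproportionately, the audited disparity will typically differ between the two strategies; the paper's broader point is that discarding incomplete rows can discard exactly the rows that make the data fairer.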

Published In

International Journal of Intelligent Systems, Volume 36, Issue 7 (July 2021), 606 pages
ISSN: 0884-8173
DOI: 10.1002/int.v36.7
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Publisher

John Wiley and Sons Ltd., United Kingdom

Author Tags

  1. algorithmic bias
  2. confirmation bias
  3. data imputation
  4. fairness
  5. missing values
  6. sample bias
  7. survey bias

Qualifiers

  • Research-article


Cited By

  • (2024) A Survey on Trustworthy Recommender Systems. ACM Transactions on Recommender Systems. DOI: 10.1145/3652891. Online publication date: 13-Apr-2024.
  • (2024) The Impact of Differential Feature Under-reporting on Algorithmic Fairness. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 1355-1382. DOI: 10.1145/3630106.3658977. Online publication date: 3-Jun-2024.
  • (2024) Fairness in Machine Learning: A Survey. ACM Computing Surveys, 56(7), 1-38. DOI: 10.1145/3616865. Online publication date: 9-Apr-2024.
  • (2024) Towards a Non-Ideal Methodological Framework for Responsible ML. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-17. DOI: 10.1145/3613904.3642501. Online publication date: 11-May-2024.
  • (2023) Adapting fairness interventions to missing values. Proceedings of the 37th International Conference on Neural Information Processing Systems, 59388-59409. DOI: 10.5555/3666122.3668717. Online publication date: 10-Dec-2023.
  • (2023) Aleatoric and epistemic discrimination. Proceedings of the 37th International Conference on Neural Information Processing Systems, 27040-27062. DOI: 10.5555/3666122.3667298. Online publication date: 10-Dec-2023.
  • (2023) Towards Risk-Free Trustworthy Artificial Intelligence. International Journal of Intelligent Systems, 2023. DOI: 10.1155/2023/4459198. Online publication date: 1-Jan-2023.
  • (2023) Fairness Without Demographic Data: A Survey of Approaches. Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, 1-12. DOI: 10.1145/3617694.3623234. Online publication date: 30-Oct-2023.
  • (2023) Can language models automate data wrangling? Machine Learning, 112(6), 2053-2082. DOI: 10.1007/s10994-022-06259-9. Online publication date: 1-Jun-2023.
  • (2023) A systematic review of generative adversarial imputation network in missing data imputation. Neural Computing and Applications, 35(27), 19685-19705. DOI: 10.1007/s00521-023-08840-2. Online publication date: 1-Sep-2023.
