
Properties of Fairness Measures in the Context of Varying Class Imbalance and Protected Group Ratios

Published: 19 June 2024
Abstract

    Society is increasingly relying on predictive models in fields like criminal justice, credit risk management, and hiring. To prevent such automated systems from discriminating against people belonging to certain groups, fairness measures have become a crucial component in socially relevant applications of machine learning. However, existing fairness measures have been designed to assess the bias between predictions for protected groups without considering the imbalance in the classes of the target variable. Current research on the potential effect of class imbalance on fairness focuses on practical applications rather than dataset-independent measure properties. In this article, we study the general properties of fairness measures for changing class and protected group proportions. For this purpose, we analyze the probability mass functions of six of the most popular group fairness measures. We also measure how the probability of achieving perfect fairness changes for varying class imbalance ratios. Moreover, we relate the dataset-independent properties of fairness measures described in this work to classifier fairness in real-life tasks. Our results show that measures such as Equal Opportunity and Positive Predictive Parity are more sensitive to changes in class imbalance than Accuracy Equality. These findings can help guide researchers and practitioners in choosing the most appropriate fairness measures for their classification problems.
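The measures compared in the abstract can be made concrete with their standard definitions: Equal Opportunity compares true positive rates across protected groups, Positive Predictive Parity compares precision, and Accuracy Equality compares overall accuracy. The following is a minimal sketch, not taken from the article, of how these gaps are computed from per-group confusion-matrix counts; the group counts used are hypothetical.

```python
# Sketch of three group fairness measures discussed in the abstract,
# computed from per-group confusion-matrix counts (tp, fp, tn, fn).
# The example counts below are hypothetical, for illustration only.

def rates(tp, fp, tn, fn):
    """Return TPR, PPV, and accuracy for one protected group."""
    tpr = tp / (tp + fn)                   # true positive rate
    ppv = tp / (tp + fp)                   # positive predictive value
    acc = (tp + tn) / (tp + fp + tn + fn)  # overall accuracy
    return tpr, ppv, acc

def fairness_gaps(group_a, group_b):
    """Absolute between-group differences; 0 means perfect fairness."""
    tpr_a, ppv_a, acc_a = rates(*group_a)
    tpr_b, ppv_b, acc_b = rates(*group_b)
    return {
        "equal_opportunity": abs(tpr_a - tpr_b),   # TPR parity
        "predictive_parity": abs(ppv_a - ppv_b),   # PPV parity
        "accuracy_equality": abs(acc_a - acc_b),   # accuracy parity
    }

# Hypothetical (tp, fp, tn, fn) counts for two protected groups:
gaps = fairness_gaps((40, 10, 45, 5), (20, 15, 55, 10))
```

Note that Equal Opportunity and Positive Predictive Parity condition on the positive class (their denominators shrink as the positive class becomes rarer), which is one intuition for why the article finds them more sensitive to class imbalance than Accuracy Equality.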

    Supplementary Material

    TKDD-2023-09-0508-SUPP (tkdd-2023-09-0508-supp.zip)
    Supplementary material


    Published In

    ACM Transactions on Knowledge Discovery from Data  Volume 18, Issue 7
    August 2024
    505 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/3613689
Editor: Jian Pei

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 June 2024
    Online AM: 28 March 2024
    Accepted: 23 March 2024
    Revised: 11 February 2024
    Received: 14 September 2023
    Published in TKDD Volume 18, Issue 7


    Author Tags

1. Group fairness
2. Class imbalance
3. Protected group imbalance

    Qualifiers

    • Research-article

    Funding Sources

    • Narodowe Centrum Nauki
