
Should Fairness be a Metric or a Model? A Model-based Framework for Assessing Bias in Machine Learning Pipelines

Published: 22 March 2024

Abstract

Fairness measurement is crucial for assessing algorithmic bias in various types of machine learning (ML) models, including ones used for search relevance, recommendation, personalization, talent analytics, and natural language processing. However, the fairness measurement paradigm is currently dominated by fairness metrics that examine disparities in allocation and/or prediction error as univariate key performance indicators (KPIs) for a protected attribute or group. Although important and effective for assessing ML bias in certain contexts such as recidivism, existing metrics do not work well in many real-world applications of ML, which are characterized by imperfect models applied to an array of instances encompassing a multivariate mixture of protected attributes and embedded in a broader process pipeline. Consequently, the upstream representational harm quantified by existing metrics, based on how the model represents protected groups, does not necessarily relate to allocational harm when such models are applied in downstream policy/decision contexts. We propose FAIR-Frame, a model-based framework for parsimoniously modeling fairness across multiple protected attributes with respect to the representational and allocational harm associated with the upstream design/development and downstream usage of ML models. We evaluate the efficacy of our proposed framework on two testbeds pertaining to text classification using pretrained language models. The upstream testbeds encompass over fifty thousand documents associated with twenty-eight thousand users, seven protected attributes, and five different classification tasks. The downstream testbeds span three policy outcomes and over 5.41 million total observations. Results in comparison with several existing metrics show that the upstream representational harm measures produced by FAIR-Frame and other metrics differ significantly from one another, and that FAIR-Frame's representational fairness measures have the highest percentage alignment and lowest error with respect to the allocational harm observed in downstream applications. Our findings have important implications for various ML contexts, including information retrieval, user modeling, digital platforms, and text classification, where responsible and trustworthy AI is becoming an imperative.
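
For readers unfamiliar with the metric-based paradigm the abstract contrasts against, the following minimal Python sketch illustrates how conventional fairness metrics reduce bias to univariate KPIs, such as the demographic parity difference and the equalized-odds gap, computed separately for a single protected attribute. It is an illustrative sketch only, not the paper's FAIR-Frame method; the function names and toy data are hypothetical.

# Illustrative sketch of conventional metric-based fairness KPIs
# (not the paper's FAIR-Frame method).
import numpy as np

def demographic_parity_difference(y_pred, group):
    # Difference in positive-prediction rates between two groups (coded 0/1).
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return abs(rate_a - rate_b)

def equalized_odds_gap(y_true, y_pred, group):
    # Largest disparity in true-positive and false-positive rates across the two groups.
    gaps = []
    for label in (1, 0):  # TPR when label == 1, FPR when label == 0
        mask = y_true == label
        rate_a = y_pred[mask & (group == 0)].mean()
        rate_b = y_pred[mask & (group == 1)].mean()
        gaps.append(abs(rate_a - rate_b))
    return max(gaps)

# Hypothetical toy data: binary labels, binary predictions, one binary protected attribute.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
y_pred = rng.integers(0, 2, 1000)
group = rng.integers(0, 2, 1000)
print(demographic_parity_difference(y_pred, group))
print(equalized_odds_gap(y_true, y_pred, group))

Each call returns a single scalar per protected attribute, which is precisely the univariate-KPI view the abstract argues breaks down for multivariate mixtures of protected attributes embedded in longer pipelines.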


Cited By

  • (2024) Preparedness and Response in the Century of Disasters: Overview of Information Systems Research Frontiers. Information Systems Research 35, 2 (June 2024), 460–468. DOI: 10.1287/isre.2024.intro.v35.n2
  • (2024) Pathways for Design Research on Artificial Intelligence. Information Systems Research 35, 2 (June 2024), 441–459. DOI: 10.1287/isre.2024.editorial.v35.n2
  • (2024) A Multifaceted Survey on Federated Learning: Fundamentals, Paradigm Shifts, Practical Issues, Recent Developments, Partnerships, Trade-Offs, Trustworthiness, and Ways Forward. IEEE Access 12 (2024), 84643–84679. DOI: 10.1109/ACCESS.2024.3413069

      Published In

      ACM Transactions on Information Systems, Volume 42, Issue 4
      July 2024
      751 pages
      ISSN: 1046-8188
      EISSN: 1558-2868
      DOI: 10.1145/3613639
      Editor: Min Zhang

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 22 March 2024
      Online AM: 23 January 2024
      Accepted: 03 January 2024
      Revised: 20 November 2023
      Received: 30 March 2023
      Published in TOIS Volume 42, Issue 4


      Author Tags

      1. Machine learning fairness
      2. algorithmic bias
      3. model framework
      4. prediction and explanation
      5. AI governance
      6. machine learning pipelines

      Qualifiers

      • Research-article

      Funding Sources

      • U.S. NSF
      • Kemper Faculty Award


      Article Metrics

      • Downloads (last 12 months): 808
      • Downloads (last 6 weeks): 170
      Reflects downloads up to 18 Aug 2024
