
Automatic annotation of protected attributes to support fairness optimization

Published: 25 June 2024

Abstract

Recent research has shown that automating high-risk decision-making tasks without fairness awareness can result in unfair decisions. The most common approaches to this problem adopt definitions of fairness based on protected attributes. Precise annotation of protected attributes enables bias mitigation techniques to be applied to kinds of data that are commonly unlabeled (e.g., images, text, etc.). This paper proposes a framework to automatically annotate protected attributes in data collections. The framework provides a single interface for annotating protected attributes of different types (e.g., gender, race, etc.) and from different kinds of data. Internally, the framework coordinates multiple sensors to produce the final annotation. Several sensors for textual data are proposed, and an optimization search technique is designed to tune the framework to specific domains. Additionally, a small dataset of movie reviews, annotated with gender and sentiment, was created. Evaluation on text datasets from diverse domains shows the quality of the annotations and their effectiveness as a proxy for estimating fairness in datasets and machine learning models. The source code is available online for the research community.
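
As a rough illustration of the architecture described in the abstract, the following Python sketch shows a single annotation interface that coordinates several sensors through weighted voting. All class and method names here are assumptions made for illustration, not the paper's actual API; the published source code defines the real interfaces.

```python
from typing import Optional, Protocol

class Sensor(Protocol):
    """A single annotator for one protected attribute.
    Illustrative only; the paper's actual interfaces may differ."""
    def annotate(self, text: str) -> Optional[str]: ...

class PronounSensor:
    """Toy textual sensor: infers gender from pronoun counts."""
    def annotate(self, text: str) -> Optional[str]:
        tokens = text.lower().split()
        male = sum(t in {"he", "him", "his"} for t in tokens)
        female = sum(t in {"she", "her", "hers"} for t in tokens)
        if male == female:
            return None  # abstain when the evidence is balanced
        return "male" if male > female else "female"

class AttributeAnnotator:
    """Coordinates several sensors and merges their votes. The per-sensor
    weights stand in for the kind of parameters the paper's optimization
    search would tune when adapting the framework to a new domain."""
    def __init__(self, sensors: list[Sensor], weights: list[float]):
        self.sensors = sensors
        self.weights = weights

    def annotate(self, text: str) -> Optional[str]:
        votes: dict[str, float] = {}
        for sensor, weight in zip(self.sensors, self.weights):
            label = sensor.annotate(text)
            if label is not None:
                votes[label] = votes.get(label, 0.0) + weight
        return max(votes, key=votes.get) if votes else None

annotator = AttributeAnnotator([PronounSensor()], weights=[1.0])
print(annotator.annotate("She said her performance carried the film."))  # female
```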

Highlights

A framework to automatically annotate protected attributes in datasets.
Techniques to annotate gender in textual collections with fairness considerations.
Optimization search approach for tuning the framework to custom domains.
Small dataset of movie reviews annotated with gender and sentiment.
Effective use of annotations as a proxy to estimate fairness in datasets.
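
To make the last highlight concrete: once protected attributes are annotated automatically, standard group-fairness metrics can be computed over them. This minimal Python sketch (function name and data are hypothetical, not taken from the paper) computes the demographic parity difference of a model's predictions across annotated gender groups; a metric of this kind can serve both to audit a dataset or model and as an objective for the tuning search.

```python
def demographic_parity_difference(preds, groups, positive=1):
    """Gap in positive-outcome rates across annotated groups; smaller is fairer."""
    rates = []
    for g in set(groups):
        outcomes = [y for y, grp in zip(preds, groups) if grp == g]
        rates.append(sum(y == positive for y in outcomes) / len(outcomes))
    return max(rates) - min(rates)

# Hypothetical sentiment predictions for six reviews, with the gender
# attribute annotated automatically by the framework:
preds  = [1, 0, 1, 1, 1, 0]
gender = ["female", "female", "male", "male", "male", "female"]
print(demographic_parity_difference(preds, gender))  # 1.0 - 1/3 = ~0.667
```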



Published In

Information Sciences: an International Journal, Volume 663, Issue C (March 2024), 544 pages

Publisher

Elsevier Science Inc.

United States


Author Tags

  1. Fairness
  2. Gender bias
  3. Natural language processing
  4. Corpus
  5. Optimization

Qualifiers

  • Research-article
