
Automatic annotation of protected attributes to support fairness optimization

Published: 25 June 2024

Abstract

Recent research has shown that automating high-risk decision-making tasks without fairness awareness can result in unfair decisions. The most common approaches to this problem adopt definitions of fairness based on protected attributes. Precise annotation of protected attributes enables bias mitigation techniques to be applied to kinds of data that are commonly unlabeled (e.g., images, text, etc.). This paper proposes a framework to automatically annotate protected attributes in data collections. The framework provides a single interface for annotating protected attributes of different types (e.g., gender, race, etc.) and from different kinds of data. Internally, the framework coordinates multiple sensors to produce the final annotation. Several sensors for textual data are proposed, and an optimization search technique is designed to tune the framework to specific domains. Additionally, a small dataset of movie reviews, annotated with gender and sentiment, was created. Evaluation on text datasets from diverse domains shows the quality of the annotations and their effectiveness as a proxy for estimating fairness in datasets and machine learning models. The source code is available online for the research community.
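
As a rough illustration of the architecture described in the abstract, the following Python sketch shows a single annotation interface that coordinates several sensors through weighted voting. All class and method names here are assumptions made for illustration, not the paper's actual API; the published source code defines the real interfaces.

```python
from typing import Optional, Protocol

class Sensor(Protocol):
    """A single annotator for one protected attribute.
    Illustrative only; the paper's actual interfaces may differ."""
    def annotate(self, text: str) -> Optional[str]: ...

class PronounSensor:
    """Toy textual sensor: infers gender from pronoun counts."""
    def annotate(self, text: str) -> Optional[str]:
        tokens = text.lower().split()
        male = sum(t in {"he", "him", "his"} for t in tokens)
        female = sum(t in {"she", "her", "hers"} for t in tokens)
        if male == female:
            return None  # abstain when the evidence is balanced
        return "male" if male > female else "female"

class AttributeAnnotator:
    """Coordinates several sensors and merges their votes. The per-sensor
    weights stand in for the kind of parameters the paper's optimization
    search would tune when adapting the framework to a new domain."""
    def __init__(self, sensors: list[Sensor], weights: list[float]):
        self.sensors = sensors
        self.weights = weights

    def annotate(self, text: str) -> Optional[str]:
        votes: dict[str, float] = {}
        for sensor, weight in zip(self.sensors, self.weights):
            label = sensor.annotate(text)
            if label is not None:
                votes[label] = votes.get(label, 0.0) + weight
        return max(votes, key=votes.get) if votes else None

annotator = AttributeAnnotator([PronounSensor()], weights=[1.0])
print(annotator.annotate("She said her performance carried the film."))  # female
```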

Highlights

A framework to automatically annotate protected attributes in datasets.
Techniques to annotate gender in textual collections with fairness considerations.
Optimization search approach for tuning the framework to custom domains.
Small dataset of movie reviews annotated with gender and sentiment.
Effective use of annotations as a proxy to estimate fairness in datasets.
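
To make the last highlight concrete: once protected attributes are annotated automatically, standard group-fairness metrics can be computed over them. This minimal Python sketch (function name and data are hypothetical, not taken from the paper) computes the demographic parity difference of a model's predictions across annotated gender groups; a metric of this kind can serve both to audit a dataset or model and as an objective for the tuning search.

```python
def demographic_parity_difference(preds, groups, positive=1):
    """Gap in positive-outcome rates across annotated groups; smaller is fairer."""
    rates = []
    for g in set(groups):
        outcomes = [y for y, grp in zip(preds, groups) if grp == g]
        rates.append(sum(y == positive for y in outcomes) / len(outcomes))
    return max(rates) - min(rates)

# Hypothetical sentiment predictions for six reviews, with the gender
# attribute annotated automatically by the framework:
preds  = [1, 0, 1, 1, 1, 0]
gender = ["female", "female", "male", "male", "male", "female"]
print(demographic_parity_difference(preds, gender))  # 1.0 - 1/3 = ~0.667
```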



Published In

Information Sciences: an International Journal, Volume 663, Issue C (March 2024), 544 pages

Publisher

Elsevier Science Inc.

United States


Author Tags

  1. Fairness
  2. Gender bias
  3. Natural language processing
  4. Corpus
  5. Optimization

Qualifiers

  • Research-article
