Measuring Bias in Search Results Through Retrieval List Comparison

Authors: Simone Kopeinik, Navid Rekabsaz

Advances in Information Retrieval: 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24–28, 2024, Proceedings, Part V

Pages 20 - 34

https://doi.org/10.1007/978-3-031-56069-9_2

Published: 24 March 2024

Abstract

Many IR systems project harmful societal biases, including gender bias, in their retrieved contents. Uncovering and addressing such biases requires grounded bias measurement principles. However, defining reliable bias metrics for search results is challenging, particularly due to the difficulties in capturing gender-related tendencies in the retrieved documents. In this work, we propose a new framework for search result bias measurement. Within this framework, we first revisit the current metrics for representative search result bias (RepSRB) that are based on the occurrence of gender-specific language in the search results. Addressing their limitations, we additionally propose a metric for comparative search result bias (ComSRB) measurement and integrate it into our framework. ComSRB defines bias as the skew in the set of retrieved documents in response to a non-gendered query toward those for male/female-specific variations of the same query. We evaluate ComSRB against RepSRB on a recent collection of bias-sensitive topics and documents from the MS MARCO collection, using pre-trained bi-encoder and cross-encoder IR models. Our analyses show that, while existing metrics are highly sensitive to the wordings and linguistic formulations, the proposed ComSRB metric mitigates this issue by focusing on the deviations of a retrieval list from its explicitly biased variants, avoiding the need for sub-optimal content analysis processes.
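
To make the comparative idea concrete, here is a minimal sketch. It is not the paper's ComSRB formula, which the abstract does not state. It assumes, purely for illustration, that the skew can be approximated as the difference between two overlaps: how much the top-k list for the non-gendered query shares with the list for the male-specific variant, minus how much it shares with the list for the female-specific variant. The function names `overlap_at_k` and `comparative_skew`, the choice of Jaccard overlap, the cutoff `k`, and the document IDs are all hypothetical.

```python
from typing import Sequence


def overlap_at_k(a: Sequence[str], b: Sequence[str], k: int) -> float:
    """Jaccard overlap between the top-k document IDs of two ranked lists.

    Assumption: set overlap is used as the list-similarity measure; the
    paper may use a different (e.g. rank-aware) comparison.
    """
    top_a, top_b = set(a[:k]), set(b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0


def comparative_skew(
    neutral: Sequence[str],
    male: Sequence[str],
    female: Sequence[str],
    k: int = 10,
) -> float:
    """Illustrative skew score for one non-gendered query.

    Positive values mean the neutral list is closer to the male-variant
    list, negative values mean it is closer to the female-variant list,
    and 0.0 means it is equally close to both.
    """
    return overlap_at_k(neutral, male, k) - overlap_at_k(neutral, female, k)


# Hypothetical ranked lists of document IDs for one topic.
neutral = ["d1", "d2", "d3", "d4"]   # e.g. results for a non-gendered query
male = ["d1", "d2", "d3", "d9"]      # results for its male-specific variant
female = ["d1", "d7", "d8", "d9"]    # results for its female-specific variant

skew = comparative_skew(neutral, male, female, k=4)
print(f"skew = {skew:.3f}")
```

Because the score compares retrieval lists only, it needs no analysis of gendered wording inside the documents, which is the property the abstract highlights for ComSRB.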



Published In

Advances in Information Retrieval: 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24–28, 2024, Proceedings, Part V

Mar 2024

528 pages

ISBN: 978-3-031-56068-2

DOI: 10.1007/978-3-031-56069-9

Editors:
Nazli Goharian, Georgetown University, Washington, DC, USA (https://ror.org/05vzafd60)
Nicola Tonellotto, University of Pisa, Pisa, Italy (https://ror.org/03ad39j10)
Yulan He, King's College London, London, UK (https://ror.org/0220mzb33)
Aldo Lipani, University College London, London, UK (https://ror.org/02jx3x895)
Graham McDonald, University of Glasgow, Glasgow, UK (https://ror.org/00vtgdb53)
Craig Macdonald, University of Glasgow, Glasgow, UK (https://ror.org/00vtgdb53)
Iadh Ounis, University of Glasgow, Glasgow, UK (https://ror.org/00vtgdb53)

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Publisher

Springer-Verlag

Berlin, Heidelberg
