DOI: 10.1145/3626772.3657943
Short Paper | Open Access

Language Fairness in Multilingual Information Retrieval

Published: 11 July 2024

Abstract

Multilingual information retrieval (MLIR) considers the problem of ranking documents in several languages for a query expressed in a language that may differ from any of those languages. Recent work has observed that approaches such as combining monolingual ranked lists (one per document language) or using multilingual pretrained language models exhibit a preference for one language over others, resulting in systematic unfair treatment of documents in different languages. This work proposes a language fairness metric that evaluates whether documents across different languages are fairly ranked, using statistical equivalence testing with the Kruskal-Wallis test. In contrast to most prior work on group fairness, we do not consider any language to be an unprotected group. Our proposed measure, PEER (Probability of Equal Expected Rank), is thus the first fairness metric specifically designed to capture the language fairness of MLIR systems. We demonstrate the behavior of PEER on artificial ranked lists. We also evaluate real MLIR systems on two publicly available benchmarks and show that PEER scores align with prior analytical findings on MLIR fairness. Our implementation is compatible with ir-measures and is available at http://github.com/hltcoe/peer_measure.
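
To make the metric concrete, the sketch below shows how a PEER-style score could be computed for a single ranked list in Python. It assumes, based only on the abstract, that PEER corresponds to the p-value of a Kruskal-Wallis test applied to the ranks at which relevant documents of each language are retrieved; the function name peer_sketch and its input format are illustrative assumptions, and the authors' ir-measures-compatible implementation at the repository above remains the authoritative definition.

# Illustrative sketch only: assumes PEER is (approximately) the p-value of a
# Kruskal-Wallis test over the ranks that relevant documents from each
# language receive in one ranked list. See github.com/hltcoe/peer_measure
# for the authors' actual implementation.
from collections import defaultdict
from scipy.stats import kruskal

def peer_sketch(ranking, doc_lang, relevant):
    """ranking: doc ids ordered best-first; doc_lang: doc id -> language code;
    relevant: set of relevant doc ids. Returns a p-value in [0, 1]: high
    values mean no evidence that any language's relevant documents are
    systematically ranked better or worse than another's."""
    ranks_by_lang = defaultdict(list)
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            ranks_by_lang[doc_lang[doc]].append(rank)
    groups = [ranks for ranks in ranks_by_lang.values() if ranks]
    if len(groups) < 2:  # nothing to compare across languages
        return 1.0
    _, p_value = kruskal(*groups)
    return p_value

# Example: relevant Persian documents consistently outrank Russian ones,
# so the estimated probability of equal expected rank is low.
ranking = ["fa1", "fa2", "ru1", "fa3", "ru2", "ru3"]
doc_lang = {d: ("fas" if d.startswith("fa") else "rus") for d in ranking}
print(peer_sketch(ranking, doc_lang, relevant=set(ranking)))

Reading the score as a p-value gives a number in [0, 1] without designating any language as the protected group, which matches the framing in the abstract; a low value signals that at least one language is treated systematically differently.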


Published In

SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2024, 3164 pages
ISBN: 9798400704314
DOI: 10.1145/3626772
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. language fairness
  2. multilingual retrieval
  3. statistical testing

Qualifiers

  • Short-paper

Conference

SIGIR 2024

Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions (20%)
